[BUG] Incorrect expression naming for struct get #1832
Merged
The `select` from #1810 wasn't working because of a discrepancy between the `name` element in the `Field` struct of the struct get expression and the `name()` method of the expression.
This PR is a quick fix to that behavior for that specific expression, but ideally we would want to consolidate the logic between `Expr::to_field` and `Expr::name` so that there aren't two code paths for getting the name of an expression. The problem is that `Expr::to_field` takes in a schema, which is not always easily obtainable at the places in the code where we call `Expr::name`. Notably, Python code in the physical plan and execution step generally does not use the schema, but does use expression names. Thus always requiring a schema for `Expr::name` would not work.
This raises the question: would a schema ever affect the name of an expression? The conclusion from a conversation with @clarkzinzow and @jaychia was no. Thus, would it be reasonable to separate the name from an expression's field, so that `Expr::name` is the single source of truth?
Moreover, in the future we would like to do table-scoped columns so that we are able to construct expressions that take data from multiple tables. Once we do that, it should be possible to derive the dtype of an expression by determining the dtypes of the columns or literals which are leaves in the expression tree. Then, we wouldn't even need to pass around a schema at all?
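To make the idea concrete, here is a rough toy model in Python. It is not the actual Rust code: the `Field`, `Column`, and `StructGet` classes are hypothetical stand-ins, and the naming rule for struct get (use the extracted field's name) is assumed for illustration. It sketches two things: `to_field` reusing `name()` so there is only one naming code path, and dtypes being derived from the leaves without a schema argument.

```python
from dataclasses import dataclass
from typing import Union

# Toy dtype: a primitive name such as "int64", or a dict of
# {field_name: dtype} standing in for a struct type. (Hypothetical.)
DType = Union[str, dict]


@dataclass
class Field:
    name: str
    dtype: DType


@dataclass
class Column:
    """Leaf expression: a table-scoped column that carries its own dtype."""
    table: str
    column: str
    dtype: DType

    def name(self) -> str:
        # No schema needed: the name is known from the expression itself.
        return self.column

    def to_field(self) -> Field:
        return Field(self.name(), self.dtype)


@dataclass
class StructGet:
    """Extracts one named field from a struct-typed child expression."""
    child: Column
    field_name: str

    def name(self) -> str:
        # Assumed naming rule for this sketch: the extracted field's name.
        return self.field_name

    def to_field(self) -> Field:
        # The name comes from name(), so the two can never disagree.
        # The dtype is derived from the leaf, not from a passed-in schema.
        child_dtype = self.child.to_field().dtype
        if not isinstance(child_dtype, dict):
            raise TypeError("struct get requires a struct-typed child")
        return Field(self.name(), child_dtype[self.field_name])


# Usage: the field's name always matches name().
expr = StructGet(Column("t", "a", {"b": "int64"}), "b")
field = expr.to_field()
print(expr.name(), field.name, field.dtype)
```

Because `to_field` calls `name()` rather than computing the name separately, the kind of mismatch fixed in this PR cannot occur in this model. Whether the real expression types can be restructured this way is exactly the open question.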
Anyway, lots to think about. We'll leave this as tech debt that should be reworked when we implement table-scoped columns, since that will likely require rethinking a lot about schemas and expressions as well.